(54) Method and computer system for software tuning

(57) Method and computer system for software tuning. A computer
system stores variables (210) for storing at least one threshold
value for at least one parameter (P1) influencing the performance
of a software application (200) with regard to a specific task. A
threshold evaluator (220) compares (430) the at least one threshold
value to at least one corresponding current value, allowing the
software application (200) to select (440) an algorithm (A1) from a
plurality of algorithms (A1 to AN) for performing the task in
accordance with the result of the comparison.
Description

Field of the Invention 

[0001] The present invention generally relates to electronic data
processing, and more particularly relates to methods, computer
program products and systems for software tuning.

Background of the Invention 

[0002] Some software products (e.g., application systems, database
systems, etc.) include parameter profiles that can be set by
specialists to achieve optimal performance of the software product
in a given environment of a computer system. The environment is
determined, for example, by the hardware used, the operating
system, the network data transfer speed, and many other factors.
There are cases where the specialist has to use a trial-and-error
procedure to determine the parameters with which the software
product performs best in the given environment. Typically, the
parameters are part of a static configuration of the software
product that is defined once.

[0003] In the publication "Statistical Models for Automatic
Performance Tuning" by Richard Vuduc et al., automatic tuning
systems are proposed that are based on search-based systems. The
paper discloses a heuristic for stopping an exhaustive compile-time
search early if a near-optimal implementation is found. Further, it
shows how to construct run-time decision rules, based on run-time
inputs, for selecting from among a subset of the best
implementations. Complex statistical techniques are used to exploit
a large amount of performance data collected during a search. The
run-time decision rules can be costly so that the compile-time
search may be preferable.

Summary of the Invention 

[0004] One embodiment of the invention provides a simple mechanism
for enabling a computer program that runs on a computer system to
tune itself without human interaction for achieving optimal system
performance in a given environment at runtime. This embodiment can
be implemented according to claims 1, 8, 9, and 17. An advantage of
this embodiment is that simple comparisons with threshold values
are used for the selection of the most suitable algorithm for a
specific task instead of complex statistical techniques. A further
advantage lies in the ability to handle multidimensional
dependencies of the performance of alternative algorithms for
performing the task.
[0005] Another embodiment provides a mechanism to enable the
computer program to dynamically adjust tuning parameters at runtime
when the environment changes. This embodiment can be implemented
according to claims 2, 8, 9, and 18. This embodiment allows the
software application to recalculate threshold values of multiple
dimensions based on the actual performance of the alternative
algorithms. If appropriate, the software application can use the
recalculated threshold values for future algorithm selection.

[0006] In another embodiment of the invention, a data storage
system automatically switches between multiple data retrieval
algorithms. This embodiment can be implemented according to claims
10 and 16 and provides a fast data retrieval mechanism in the
presence of more than one parameter influencing the performance of
the data retrieval.

[0007] The aspects of the invention will be realized and attained
by means of the elements and combinations particularly pointed out
in the appended claims. Also, the described combination of the
features of the invention is not to be understood as a limitation,
and all the features can be combined in other constellations
without departing from the spirit of the invention. It is to be
understood that both the foregoing general description and the
following detailed description are exemplary and explanatory only
and are not restrictive of the invention as described.

Brief Description of the Drawings

[0008]

FIG. 1 is a simplified block diagram of a computer system that
can be used with an embodiment of the invention;

FIG. 2 illustrates initialising threshold values;

FIG. 3 illustrates dynamically adjusting threshold values in one
dimension;

FIG. 4 illustrates threshold values and corresponding algorithms
in two dimensions;

FIG. 5 is a simplified block diagram of an example of a data
storage computer system that can be operated according to the
invention;

FIG. 6 is a diagram of a static hierarchical data structure used
in one embodiment of the data storage system;

FIG. 7 schematically shows the initial state of an anchor as used
in the data structure;

FIG. 8 illustrates the use of the anchor for the implementation
of an InfoType;

FIG. 9 illustrates adding an InfoCell to the data structure;

FIG. 10 illustrates the structure that is obtained when multiple
InfoTypes are put into the data structure;

FIG. 11 shows an InfoCourse that contains data;

FIG. 12 illustrates multiple InfoCourse paths in the data
structure;

FIG. 13 illustrates how to retrieve data from the data storage
system when operated according to the invention;

FIG. 14 illustrates how two result sets can be merged into a
single result set when applying the Boolean OR operator;

FIG. 15 illustrates how two result sets can be merged into a
single result set when applying the Boolean AND operator;

FIG. 16 illustrates a first implementation for the result flags
and the result sets;

FIG. 17 illustrates a second implementation for the result flags
and the result sets;

FIG. 18 illustrates how result flags relate to corresponding
IC-anchors in the second implementation;

FIG. 19 illustrates how Boolean operators can be applied to the
result sets in the second implementation; and

FIG. 20 is a simplified block diagram of software components of
the computer system to dynamically select a data retriever
implementation.

Detailed Description of the Invention 

[0009] FIG. 1 shows a software application 200 as part of a
computer system 990 that can be used with an embodiment of the
invention. The software application 200 uses parameter variables
210 that can be set to specific threshold values for a
corresponding parameter. The threshold values can come from a
parameter profile (e.g., for the second parameter PARAMETER 2) or
they can be calculated by the software application 200 (e.g., for
the first parameter PARAMETER 1). For example, the software
application can be a technical software application, such as a data
storage system management application, or it can be a business
software application, such as an enterprise resource planning
application or a customer relationship management application, or
any other software application.
[0010] The parameter variables 210 store information about the
parameters that can be used to influence the performance of the
software application 200 with regard to a specific task. The
parameter variables P1 to Pn will also be referred to as variables.
The software application implements various algorithms A1 to AN for
performing the specific task in different ways. Further algorithms
used for different tasks may be implemented in the software
application. For example, specific tasks can be sorting data,
retrieving data, filtering data or any other operation performed on
data that may depend on a parameter that has influence on the
performance of the specific task. For example, parameters that can
influence the performance can be either hardware-related parameters
(e.g., the number of processors in the computer system, the
available main memory of the computer system) or software-related
parameters (e.g., the main memory allocated to the software
application, the size of the data volume, the number of hits of a
query, or any other parameter value that can influence the
performance of the software application). Software-related
parameters can easily be modified by the software application
itself, whereas modifying hardware-related parameters in many cases
requires human interaction (e.g., adding an additional blade in a
blade server).

[0011] The software application further implements a 
threshold evaluator 220 and a threshold calculator 230. 
[0012] In a first step, the threshold calculator 230 is used to
calculate 410 one or more threshold values of the first parameter
that relates to the specific task. Details are explained under
FIG. 2. The calculated one or more threshold values are stored 411
in corresponding variables (e.g., variable P1). In one alternative,
multiple threshold values of the parameter are stored in one
variable (vector variable). In another alternative, a corresponding
variable is used for each threshold value.
[0013] The software application generates current values with
respect to the parameters. For example, for performing a sort
function for data in a list, the current value can correspond to
the length of the list. A threshold value in the first variable P1
indicates that for a current value below the threshold value the
best system performance is achieved when using a first algorithm
(e.g., A1) and for a current value above the threshold value the
best system performance is achieved when using a second algorithm
(e.g., A2). In other words, each algorithm covers a corresponding
value range where the use of the algorithm provides the best system
performance. That is, the one or more threshold values separate the
value range of the first parameter P1 into at least two intervals.

[0014] The threshold evaluator 220 uses 420 the one or more
threshold values P1 when comparing 430 the current value with the
one or more threshold values to determine the appropriate algorithm
for performing the specific task with optimal performance. In the
example, the first algorithm A1 is selected 440 from the plurality
of algorithms A1, A2, AN for performing the specific task. The
selection is in accordance with the result of the comparing step
430. For example, the first algorithm A1 is assigned to the
interval that includes the current value. A1 is determined, but any
other algorithm A2 to AN might be selected (dashed arrows). This
depends on the interval to which the current value belongs.
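
For illustration only, the comparing step 430 and the selecting step 440 can be sketched in Python as an interval lookup over sorted threshold values. The function names, the list layout and the example threshold of 32 are assumptions made for this sketch and are not taken from the specification.

```python
from bisect import bisect_right


def select_algorithm(thresholds, algorithms, current_value):
    """Return the algorithm assigned to the interval containing current_value.

    thresholds: sorted threshold values of one parameter (assumed layout).
    algorithms: one callable per interval, so there is one more algorithm
                than there are threshold values.
    """
    if len(algorithms) != len(thresholds) + 1:
        raise ValueError("need exactly one algorithm per interval")
    # bisect_right counts the thresholds <= current_value; that count is
    # the index of the interval the current value falls into.
    return algorithms[bisect_right(thresholds, current_value)]


def insertion_sort(items):
    """Simple sort, assumed here to be the better choice for short lists."""
    result = []
    for item in items:
        pos = len(result)
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


def builtin_sort(items):
    """General-purpose sort, assumed here to be better for long lists."""
    return sorted(items)


data = [5, 3, 8, 1]
# The current value is the list length; 32 is a made-up threshold value.
sort_fn = select_algorithm([32], [insertion_sort, builtin_sort], len(data))
print(sort_fn.__name__, sort_fn(data))
```

In this sketch a current value equal to a threshold value is assigned to the upper interval; the specification leaves that boundary case open.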

[0015] Once the specific task has been performed, the actual
performance of the selected first algorithm A1 can be measured 450
and a check is performed 460 whether the measured performance
complies with the one or more threshold values, that is, whether
the assignment of the selected algorithm to the interval including
the current value delivers the best performance within the
plurality of algorithms.

[0016] The threshold calculator 230 uses the performance measure
and, in case the performance measure does not comply with the
current setting of the one or more threshold values for the first
parameter P1, recalculates 470 the one or more threshold values for
the first parameter P1. The one or more recalculated threshold
values are then used to update 471 the corresponding variables 210.

[0017] FIG. 2 illustrates initialising threshold values. Regarding
the initial definition of threshold values, one alternative is to
provide a profile parameter for each threshold value in a profile
file for the software application 200. Profile files are commonly
used, for example, for defining buffer sizes, time-out parameters,
hardware configuration parameters, or parameters for determining
software behaviour in specific situations, such as error handling.
For example, some parameters may influence the performance of a
software application. Usually, the setting of profile parameters
requires a specialist who is familiar with the architecture of the
software and has a good feeling for the way the software is
influenced by the parameter settings. Furthermore, the specialist
has to know the value ranges of each parameter. In practice, it
turns out that the more profile parameters are available, the less
likely it is that the specialist will succeed in tuning the
software for optimal performance.
[0018] For this reason, an embodiment of the invention can be used
to reduce the number of profile parameters that have to be manually
set to a necessary minimum. Specialists working from outside the
software may tune only parameters that depend on a specific use
case or business scenario, where pre-tuning of the software is
difficult. For example, consider the pre-tuning of relational
database management systems, and in particular deciding in advance
which indexes to create, which depends on the final structure of
the various tables in the database system.

[0019] In this embodiment of the invention, the soft- 
ware itself can tune scenario-independent parameters, 
such as the threshold values for the various algorithms 
A1 to An. The initial values may be set during start-up 
of the software by running predefined test cases for the 
various algorithms. 

[0020] The example of FIG. 2 illustrates the automatic
determination (initial calculation 410, cf. FIG. 1) of a threshold
value with regard to two algorithms A1 (illustrated by bullet
points) and A2 (illustrated by circles). In the example, a
parameter p is tuned to a series of discrete values between a pair
of chosen extreme values. The spacing of values between the
extremes need not be very fine and can be equidistant. For each
value p(x), a measurement is made of the performance difference
D(p(x)) (defined as runtime difference or any other suitable
measure) between the two algorithms. For example, the difference
may be defined as D(p(x)) = PA1(p(x)) - PA2(p(x)). For each
difference D(p(x)), either a single measurement is made or the
average of several runs can be taken.

[0021] Below the threshold value, the performance of algorithm A1
decreases steadily with increasing value p(x) whereas the
performance of algorithm A2 increases. So the magnitude of the
performance difference D(p(x)) between the algorithms decreases but
always has the same sign. For example, if D(p(x)) = PA1(p(x)) -
PA2(p(x)), then the difference is positive as long as algorithm A1
has a better performance than algorithm A2.
[0022] The measured performance difference D(p(a)) for value p(a)
is the last positive difference, so p(a) is the greatest value of p
such that D(p) > 0. From value p(b) onward, the difference is
negative, so p(b) is the least value of p such that D(p) < 0. The
threshold value lies in the interval between values p(a) and p(b).
[0023] An iteration, for example based on an interval bisection
procedure, can be used to locate the threshold value within the
interval [p(a), p(b)].
[0024] For the first step of the iteration, the value p(a) is the
left interval border, p(b) is the right interval border and
p(c) = [p(a) + p(b)] / 2 is the middle of the interval. A new
measurement D(p(c)) of the performance difference between the two
algorithms is made for value p(c).
[0025] If the difference is positive, D(p(c)) > 0, then the
threshold value lies in the right half-interval [p(c), p(b)].

[0026] If the difference is negative, D(p(c)) < 0, then the
threshold value lies in the left half-interval [p(a), p(c)].

[0027] If the difference is zero, D(p(c)) = 0, then the threshold
value is determined exactly by the parameter value p(c) and the
iteration is complete.
[0028] If the magnitude of the difference is greater than a
predefined delta, the iteration continues. The half-interval
containing the threshold value is subdivided into two smaller
half-intervals (which are either [p(c), p(d)] and [p(d), p(b)] or
[p(a), p(d)] and [p(d), p(c)], depending on whether D(p(c)) is
positive or negative) and the performance difference D(p(d)) is
evaluated for the value p(d), and so on, as above.

[0029] The procedure stops as soon as the threshold value has been
identified with sufficient precision. This may depend on:

[0030] The size of the measured difference D(p), for example,
whether D(p) < delta, for some predefined minimal difference delta.

[0031] The type of the parameter, for example, whether p(x) is of
type integer or floating point.
[0032] In this way, this embodiment of the invention can calculate
initial values for all threshold values during start-up.
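
The coarse scan followed by interval bisection can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the name find_threshold, the step limit max_steps and the synthetic difference function in the usage line are hypothetical. In practice the function diff would measure both algorithms (for example with time.perf_counter) and return the performance difference D(p), with positive values meaning that A1 performs better.

```python
def find_threshold(diff, grid, delta=1e-3, max_steps=50):
    """Locate the parameter value at which diff(p) changes sign.

    diff(p) plays the role of D(p) = PA1(p) - PA2(p).
    grid is the coarse, increasing series of values between the extremes.
    Returns None if no sign change is found inside the scanned range.
    """
    # Coarse scan: measure once per grid point, then look for the last
    # positive value p(a) followed by the first non-positive value p(b).
    values = [diff(p) for p in grid]
    a = b = None
    for i in range(len(grid) - 1):
        if values[i] > 0 and values[i + 1] <= 0:
            a, b = grid[i], grid[i + 1]
            break
    if a is None:
        return None

    # Interval bisection inside [p(a), p(b)].
    for _ in range(max_steps):
        c = (a + b) / 2          # middle of the current interval
        d = diff(c)
        if abs(d) < delta:       # precise enough: stop
            return c
        if d > 0:
            a = c                # threshold lies in the right half [c, b]
        else:
            b = c                # threshold lies in the left half [a, c]
    return (a + b) / 2


# Synthetic difference that crosses zero at p = 50 (illustration only).
print(find_threshold(lambda p: 100 - 2 * p, [0, 20, 40, 60, 80, 100]))  # prints 50.0
```

For an integer-valued parameter the loop could instead stop once the interval borders are adjacent integers; the sketch covers only the delta-based stopping rule.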

[0033] This start-up calculation can last for several milliseconds
or even seconds before the software is up and running.

[0034] However, the software tunes itself automatical- 
ly and optimally on the given environment (e.g., given 
hardware, operating system). It can be expected to do 
so more quickly, more exactly, and more inexpensively 
than a specialist could tune the software by manually 
setting profile parameters. 

[0035] FIG. 3 illustrates updating threshold values dynamically
during operation of the software application 200 (cf. FIG. 1).
There can be the plurality of algorithms A1 to AN for performing
the same task, where each algorithm is used for a corresponding
interval. That is, each algorithm has at least one threshold value
that represents a boundary of the corresponding interval.
[0036] Even if the initial threshold values are correct at start-up
and for some time after start-up, this situation may change in the
course of time, for example, because of memory fragmentation or
accumulating memory leaks due to bugs in the coding or for other
reasons. Therefore, after a certain time the software can run under
conditions that differ from those that prevailed immediately after
start-up.

[0037] The performance of any of the algorithms A1 to AN may
degrade or improve by different amounts relative to the other
algorithms. Therefore, the corresponding threshold values may shift
in the course of time.
[0038] This embodiment of the invention can automatically and
regularly repeat its determination of its threshold values, as
specified in the above calculation (cf. FIG. 2), so as to adjust
the threshold values used to switch algorithms dynamically during
runtime.
[0039] To revise its determination of threshold values, 
the software application 200 makes automatic perform- 
ance measurements 450 (cf. FIG. 1), for example, by 
using an appropriate time measuring component. For 
example, the measurements can simply be records of 
the time taken for certain tasks to run. The measure- 
ments can be made either on an ongoing basis or from 
time to time. 

[0040] When using ongoing performance measure- 
ments, the software application 200 measures the per- 
formance of each execution of an algorithm. 
[0041] If the current execution of algorithm A1 corresponds to a
current value p(x) that is below the threshold value B1, the
performance PA1(p(x)) of algorithm A1 should in general be better
than its performance at the corresponding threshold value B1, since
this is the reason why the software 200 executes algorithm A1
instead of algorithm A2 (see FIG. 2).

[0042] At the threshold value B1 , the performance of 
algorithm A1 is by definition the same as that of algo- 
rithm A2, that is, PA1(B1) = PA2(B1). 
[0043] In FIG. 3, for the algorithm An (1 < n < N) chosen in the
interval between the two neighbouring threshold values B(n - 1) and
Bn, the performance PAn(pc) of the algorithm An at the value pc (in
the interval) should be either the same as or better than its
performance PAn(B(n - 1)) and PAn(Bn) at the lower and upper
threshold values B(n - 1) and Bn.

[0044] If the performance PAn(pc) of algorithm An (middle arrow) is
below its performance PAn(B(n - 1)) (left arrow) or PAn(Bn) (right
arrow) at a neighbouring threshold value (either B(n - 1) or Bn,
respectively), then it is no longer advantageous to choose the
algorithm An at the value pc. As a consequence, the checking step
460 (cf. FIG. 1) concludes that the measured performance for the
algorithm does not comply with the current setting of threshold
values.

[0045] Therefore, in future, at parameter value pc, the software
application chooses an algorithm that was earlier measured as
performing better in the neighbouring interval, which is either
algorithm A(n - 1) or A(n + 1). This choice is equivalent to moving
the threshold value B(n - 1) or Bn, respectively, to a new position
at value pc, which corresponds to the recalculating step 470 (cf.
FIG. 1).
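
As a hedged illustration, the checking step 460 and the threshold move of the recalculating step 470 might be sketched as follows. The function name, the 0-based list layout and the convention that larger numbers mean better performance are assumptions of this sketch, not details given in the specification.

```python
def check_and_adjust(thresholds, lower_idx, pc, perf_pc, perf_lower, perf_upper):
    """Return a possibly updated copy of the threshold list.

    thresholds[lower_idx] and thresholds[lower_idx + 1] are the lower and
    upper bounds B(n - 1) and Bn of the interval of algorithm An.
    perf_pc is the measured performance of An at the current value pc;
    perf_lower and perf_upper are its performance at the two bounds.
    Larger numbers mean better performance (assumed convention).
    """
    updated = list(thresholds)
    if perf_pc < perf_lower:
        # An is worse at pc than at its lower bound: the lower neighbour
        # A(n - 1) takes over at pc, i.e. B(n - 1) moves to pc.
        updated[lower_idx] = pc
    elif perf_pc < perf_upper:
        # An is worse at pc than at its upper bound: the upper neighbour
        # A(n + 1) takes over at pc, i.e. Bn moves to pc.
        updated[lower_idx + 1] = pc
    # Otherwise the measurement complies and nothing changes.
    return updated


# Algorithm An covers [20, 30]; at pc = 22 it performs worse than at 20.
print(check_and_adjust([10, 20, 30], 1, 22, 5.0, 6.0, 4.0))  # prints [10, 22, 30]
```

The sketch returns a new list rather than changing the stored values in place, so the caller can decide whether to commit the move or to trigger a full recalculation instead.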

[0046] However, when algorithm A(n - 1) or A(n + 1) is next run at
parameter value pc, it may be the case that the newly measured
performance of the algorithm is also reduced, possibly even more so
than the performance of the original algorithm An. In this case,
the threshold value B(n - 1) or Bn, respectively, should not have
been moved to the new position pc.
[0047] This situation may occur in practice because any reasons for
the reduced performance of algorithm An may also apply to reduce
the performance of algorithm A(n - 1) or A(n + 1).

[0048] Therefore, a situation like this can trigger a recalculation
of all the threshold values, either immediately or as soon as
practically possible, for example, when the system load is
sufficiently low. Alternatively, the software can generate a system
message to warn an administrator that the latest performance
measurements indicate the need for a recalculation of the threshold
values.

[0049] When using performance measurements from time to time, the
software recalculates 470 the threshold values preferably at times
of low system load, in the same way that it does during start-up
for the initial calculation 410 (cf. FIG. 1). This recalculation
may be defined as part of a bundle of housekeeping tasks that are
performed at regular time intervals by the software. In this case,
the threshold values are adjusted with a lower frequency than when
using the ongoing basis alternative.

[0050] Removing as many manually set profile parameters as possible
from a profile file and letting the software itself tune such
parameters instead of a specialist can lead to an improved
performance over the full range of parameter values.

[0051] However, in certain exceptional and rare situations, there
may be good reasons why such parameters should not be tuned by the
software but from outside by a specialist.

[0052] These exceptional cases can be handled as follows. By
default, profile parameters that are tuned by the software itself
do not appear in the profile file. However, if an expert explicitly
sets a parameter threshold value by entering it in the profile
file, then the software does not change the threshold value of this
parameter.
[0053] FIG. 4 illustrates threshold values and corresponding
algorithms in two dimensions. The first dimension is defined by the
first parameter p as described in FIG. 3. The second dimension is
defined by a second parameter p'. For the second parameter p', for
example, three threshold values B'(n - 1), B'(n), and B'(n + 1) can
be stored in the corresponding variables. There can be any number
of further parameters defining further dimensions.
[0054] In the example, for each value of the first parameter p, two
algorithms are available. In general, any number of algorithms can
be available for each value of one dimension. For example, for the
value pc the algorithms An and A'n can be used in the first
dimension interval [B(n - 1), B(n)] to achieve optimal performance
regarding the first parameter p. Each algorithm is represented by a
corresponding rectangle in the drawing to reflect the coverage of
the two dimensions. However, which of the two algorithms provides
the best performance depends also on the second dimension. If the
value of p' is in the second dimension interval [B'(n - 1), B'(n)],
then the algorithm An is selected by the software application. If
the value of p' is in the second dimension interval [B'(n),
B'(n + 1)], then the algorithm A'n is selected. That is, in the
case of multidimensional performance dependencies the threshold
evaluator compares a plurality of current values of various
dimensions to a plurality of corresponding threshold values and
selects the appropriate algorithm for the specific task that
provides the best performance for the current combination of
current values in the various dimensions.
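
The two-dimensional selection can be pictured as a grid lookup, sketched here in Python for illustration only. The function name select_2d, the nested-list table and the example threshold values 100 and 4 are assumptions of this sketch; the algorithm labels merely mirror the An and A'n notation used above.

```python
from bisect import bisect_right


def select_2d(p_thresholds, q_thresholds, table, p_value, q_value):
    """Pick the algorithm for a combination of two current values.

    p_thresholds / q_thresholds: sorted threshold values of the first
    parameter p and the second parameter p' (called q here, since p' is
    not a valid Python name).
    table[row][col]: algorithm assigned to the rectangle formed by the
    row-th interval of p and the col-th interval of p'.
    """
    row = bisect_right(p_thresholds, p_value)
    col = bisect_right(q_thresholds, q_value)
    return table[row][col]


# One threshold per dimension gives a 2 x 2 grid of rectangles.
table = [
    ["A1", "A'1"],   # p below 100: A1 for small p', A'1 for large p'
    ["A2", "A'2"],   # p at or above 100
]
print(select_2d([100], [4], table, 50, 8))  # prints A'1
```

Further dimensions would add one more sorted threshold list and one more level of nesting in the table.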
[0055] The threshold calculator can initialise the threshold values
of various dimensions by using the initialising procedure described
under FIG. 2 for one dimension while values of the further
dimensions are kept constant during the performance measurement.
[0056] In the following, an example of the software application 200
is database management software of a data (storage) system that can
be used together with an embodiment of the invention. The data
system may be implemented according to a relational database model.
However, the system is not limited to use within the constraints of
a known relational database architecture. The elements of the data
system roughly translate to the known nomenclature of relational
database theory as follows (with the definitions used with an
embodiment of the invention on the left):

InfoSystem <- Management System
InfoArea <- Database
InfoCluster <- Table
InfoType <- Attribute
InfoCourse <- Data record
InfoCell <- Field

[0057] Further definitions of terms, as used hereinafter:

Boolean operators:

operators used in Boolean statements, e.g., AND, OR.

Relational operators:

operators used in relational statements, e.g.,
< (less than)
<= (less than or equal to)
> (greater than)
>= (greater than or equal to)
= (equal to)
<> (not equal to)

Condition:

relational statement comparing data, such as numerical data or
alphanumeric data, using one or more relational operators.

Boolean expression:

statement including multiple conditions that are combined using
Boolean operators.

[0058] FIG. 5 is a simplified block diagram of the computer system
990 that can be used with an embodiment of the invention. The
computer system 990 includes multiple computing devices (e.g.,
first computing device 901 and second computing device 902) that
communicate over a network 999, such as a local area network (LAN),
wide area network (WAN), the Internet, or a wireless network.

[0059] For example, the second computing device 902 may be a
backend system, such as a database system, a file system or an
application system, that stores data. The data can also be stored
anywhere inside or outside of the computer system 990.
[0060] The first computing device 901 may be used 
to compose Boolean expressions 500 to be used in a 
QUERY for retrieving selected data from the second 
computing device 902. For example, the first computing 
device 901 may be a front end computer that provides 
a graphical user interface (GUI) to a user. 
[0061] There can be various ways in which the data storage system
902 receives the QUERY, dependent on the interfaces offered for the
data storage system 902. For example, in case of using an SAP R/3
based system, the SAP Remote Function Call (RFC) functionality
provided by the ABAP kernel can be used. An application programming
interface (API) can be implemented as a collection of ABAP Function
Modules. The API uses the RFC functionality to communicate remotely
with the data storage system. An SAP R/3 based application uses the
API for receiving parameters that are passed to the data storage
system 902. The corresponding results are then returned as ABAP
parameters. A selection query is filled into an internal table in
ABAP and can be rapidly processed by the data storage system since
the query is already pre-structured.
[0062] In general, any interface or meta format can be used to post
a query to the data storage system. A pre-structured query is
useful but not necessary. The query may also be coded in XML or
simply be passed to the data storage system as a string that has to
be parsed within the data storage system.
[0063] FIGs. 5 to 11 explain details of one embodiment of the data
storage system 902. For example, as described in the patent
application PCT/EP02/01026, the data storage system 902 can be
configured as a fast cache with all data structures residing in its
main memory. The Boolean expression 500 can include at least a
first portion 501 and a second portion 502, each portion
representing a selection condition of any degree of complexity
applicable to the data structures in the main memory. Further
portions may be included. The portions are combined through logical
or relational operators (OP).

[0064] FIG. 6 is a diagram of a static hierarchy structure used in
one embodiment of the data storage system 902. Each box in the
structure corresponds to an instance of the data type that is used
as a label for the box. Multiple overlapping boxes illustrate
multiple instances of the same data type. A single arrow between
instances of different data types stands for an arbitrary number of
arrows between multiple instances at each corresponding level of
the structure. In the following, the data type labels are used to
refer to corresponding instances of the data type. The highest
level in the structure is the InfoSystem level. Down from the top
level, one or more InfoAreas are connected to the InfoSystem. The
InfoSystem provides the algorithms necessary to operate the data
storage system at run time. The InfoSystem is connected to any
number of InfoAreas through a linking element, which will be
described hereinafter as an anchor. These InfoAreas can, for
example, refer to logical units of the InfoSystem.

[0065] Each InfoArea is connected via a linking element (again an
anchor as described hereinafter) to an InfoCluster. In turn, each
InfoCluster is connected to at least one InfoCourse and at least
one InfoType, through respective linking elements, such as anchors.
The InfoType can be seen as an attribute of a table; an InfoCourse
always starts in an InfoCluster. If an InfoCourse stays within an
InfoCluster with its addressed InfoCell elements corresponding to
fields of a table, then the InfoCourse is similar to a record of a
table, such as a relational database table.

[0066] Under the InfoCourse and the InfoType the InfoCell is found;
this is the element on the lowest level in the hierarchical
structure. On the creation of an InfoType, an anchor is created
that is also an InfoCell. This anchor has the function of
representing the structure of the following InfoCell elements.

[0067] For the implementation of the levels below the
InfoArea level, i.e. the InfoCluster, the InfoCourse, the
InfoType, and the InfoCell levels, use is made of a data
element according to the invention as shown in FIG. 7.
In this example, the data element is shown schematically
as an anchor and is provided with a number of
pointers. The pointers of the first pair are labelled LVR
and RVR (Left Vertical Ring and Right Vertical Ring,
respectively), the pointers of the second pair are labelled
LHR and RHR (Left Horizontal Ring and Right Horizontal
Ring, respectively), the pointers of the third pair are
labelled LSR and RSR (Left Self Ring and Right Self
Ring, respectively), and the single pointer is labelled IF
(InFormation bridge). Note that the pointers LSR, RSR
and IF are in principle optional.

[0068] Further pointers may be used. In the initial state,
as shown in FIG. 7, all pointers point to the anchor. This
initial state is also the simplest possible ring structure.
Every pointer in the structure has a valid address, and
undefined pointers (nil pointers) are avoided.
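
The initial state described above can be sketched as follows. This is a minimal illustration only: the class name `DataElement` and the lower-case attribute names are assumptions chosen to mirror the pointer labels LVR, RVR, LHR, RHR, LSR, RSR and IF, not names taken from the specification.

```python
class DataElement:
    """Hypothetical data element carrying the pointers named in the text."""

    def __init__(self, label=None):
        self.label = label
        # Initial state: every pointer refers back to the element itself,
        # so no pointer is ever undefined (nil).
        self.lvr = self.rvr = self   # vertical ring pair
        self.lhr = self.rhr = self   # horizontal ring pair
        self.lsr = self.rsr = self   # self ring pair (optional in the text)
        self.info = self             # IF, the information bridge (optional)

    def pointers(self):
        """Return all pointers of this element as a list."""
        return [self.lvr, self.rvr, self.lhr, self.rhr,
                self.lsr, self.rsr, self.info]


anchor = DataElement("anchor")
# A freshly created anchor forms the simplest closed ring: it points to itself.
print(all(p is anchor for p in anchor.pointers()))
```

Because every pointer always holds a valid reference, code that walks a ring can stop when it arrives back at the anchor instead of testing for a nil pointer.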

[0069] In the following, example data is used as
shown in table A of FIG. 12. The table includes data
regarding first names, ages and weights. For this table an
InfoCluster is generated. Furthermore, three InfoTypes
are generated to represent first names, ages, and
weights, respectively.

[0070] FIG. 8 illustrates the use of the data element
for the implementation of the InfoType. The InfoType
includes semantic information, such as the data type
(in this example "INTEGER"), the field name (in this
example "age"), etc. The InfoType has an anchor associated
with it. The anchor points with its RVR pointer
to the actual information carrier, that is, the InfoCell.
As described above, the InfoCell is the lowest-level entity
within the data system. The InfoCell holds the information,
as shown in FIG. 8; in this example "age is 30 in
INTEGER".

[0071] As described above, the InfoCell is provided
with an LVR/RVR pointer pair. As shown in FIG. 8, the
RVR pointer of the InfoCell points towards the anchor,
and the LVR pointer also points to the anchor. As a
result, the ring configuration of the anchor is maintained.
[0072] FIG. 9 illustrates how a further InfoCell is added
to the data structure. The InfoCell (with the value
"25") is inserted in the LVR ring after the first InfoCell.
The LVR and RVR pointers of the InfoCell point to the
anchor, so as to maintain a closed ring.
[0073] The order in which the InfoCells are organized
depends on their value. In case of a smaller value, the
InfoCell is inserted on the LVR side, otherwise on the
RVR side. This practice is well known in the art as binary
tree building. Preferably, the binary trees are organized
as balanced or AVL trees, using methods that are well
known in the art. These kinds of trees minimize the
number of levels within the tree structure, so as to
minimize access time. Preferably, all tree structures within
the data system are dynamically balanced in use, so as
to guarantee optimum access times.
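
The ordering rule (smaller values on the LVR side, all others on the RVR side) can be sketched as a plain binary search tree whose end pointers lead back to the anchor. The class and function names are hypothetical, and the sketch deliberately omits the AVL balancing that the text prefers.

```python
class Anchor:
    def __init__(self):
        self.rvr = self  # root InfoCell, or the anchor itself while empty


class InfoCell:
    def __init__(self, value, anchor):
        self.value = value
        self.lvr = anchor  # end pointers lead back to the anchor,
        self.rvr = anchor  # which keeps the ring closed


def insert(anchor, value):
    """Insert a value: smaller goes to the LVR side, otherwise RVR side."""
    cell = InfoCell(value, anchor)
    if anchor.rvr is anchor:          # empty tree
        anchor.rvr = cell
        return cell
    node = anchor.rvr
    while True:
        side = "lvr" if value < node.value else "rvr"
        child = getattr(node, side)
        if child is anchor:           # reached an end pointer
            setattr(node, side, cell)
            return cell
        node = child


def in_order(anchor):
    """Return all values in ascending order; the anchor acts as stop marker."""
    out, stack, node = [], [], anchor.rvr
    while stack or node is not anchor:
        while node is not anchor:
            stack.append(node)
            node = node.lvr
        node = stack.pop()
        out.append(node.value)
        node = node.rvr
    return out


age_anchor = Anchor()
for age in (30, 25, 41, 28):   # illustrative values, not taken from table A
    insert(age_anchor, age)
print(in_order(age_anchor))
```

A production structure would rebalance after each insertion (for example with AVL rotations) so that the number of levels stays logarithmic in the number of InfoCells.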
[0074] FIG. 10 illustrates the structure that is obtained
when all InfoTypes of the table A are put into the data
structure. In total, three InfoTypes are present: age, first
name, and weight. Note that the end pointers of each
last element in the respective trees are not shown. Under
each anchor of the InfoType, the InfoCells are organized
in a binary tree. The InfoCluster points to an
anchor, which in turn points to a first InfoType. The first
InfoType in turn points to the other two InfoTypes. Each
InfoType points to an anchor. The anchor has the additional
function of a marker that can be used by an access
or query process as a break or return sign.
[0075] To complete the implementation of the table,
the relations between the InfoTypes have to be made. To
this end an InfoCourse is introduced.
[0076] FIG. 11 shows the InfoCourse that contains the
data for a row of the table A. Use is made of the LHR
and RHR pointers. The end pointers again point back to
the anchor of the InfoCourse to maintain the ring structure.
Note that the InfoCourse also forms a binary tree,
sorted by the ID numbers of the InfoTypes. Note that the
ID numbers of the InfoTypes are unique. For example,
integer values are used for the ID numbers.
[0077] FIG. 12 illustrates all the InfoCourse paths (for
example implemented using pointers) for the table A.
Note that all InfoCells have been provided in the top
section with their respective InfoType ID number, by which
the binary tree configuration of the InfoCourse via the
LHR/RHR pointers is organized. Elements that belong
to an InfoType are connected by solid arrows. Elements
that belong to an InfoCourse are connected by dashed
arrows.

[0078] When five million records with 100 attributes
(e.g., 100 columns of a relational database table) are
loaded into the data storage system 902, then five million
InfoCourse trees (InfoCourses) exist, one for each
record. Each InfoCourse includes 100 nodes. Each
InfoCourse has a corresponding InfoCourse anchor
pointing to the respective InfoCourse. In other words, when
five million records are loaded into the data storage system
902, five million InfoCourse anchors also exist.
[0079] FIGs. 13 to 15 explain a data retrieval mechanism
as an example of a specific task that can be implemented
by multiple algorithms. FIG. 16 explains a first
implementation of the data retrieval mechanism and
FIGs. 17 to 19 explain a second implementation. Each
implementation is suitable for a corresponding parameter
value range (number of hits).
[0080] FIG. 13 illustrates how a computer implement- 
ed method can be used to retrieve data from the data 
storage system 902 when operated according to the in- 
vention. It is assumed that the data storage system 902 
stores the data using the data structure as described in 
FIGs. 2 to 8. Note that in this data structure each
InfoCourse 300, 301, 302, 303 has an InfoCourse anchor
310, 311, 312, 313.

[0081] Once the Boolean expression is received by 
the data storage system 902, a parser decomposes the 
Boolean expression 500 into the first portion 501 and
the second portion 502. If further portions are included 
they are also subject to decomposition. Each portion in- 
cludes at least one condition that has to be fulfilled by 
any InfoCourse that is selected by the original query. 
The conditions relate to InfoTypes. 
[0082] The data storage system 902 then determines
a result set for each portion. In the example, a first result
set 361 includes result flags (C-FLAG1) in compliance
with the first portion 501 and a second result set 362
includes further result flags (C-FLAG2) in compliance
with the second portion 502. A result flag is used to
indicate whether a specific InfoCourse fulfils a condition
in a corresponding portion. Each result flag relates (bold
up-arrows) to the result identification number 351, 352 of
the result set 361, 362 to which it belongs.
The result flags within a result set are also interrelated
(dashed arrows). Further, each result flag relates
(bold left-arrows) to the corresponding InfoCourse anchor
(IC-anchor) 310, 311, 312, 313 of the InfoCourse
fulfilling the corresponding condition.
[0083] The two result sets 361, 362 can originate from
the evaluation of a complex Boolean expression, where
the first result set 361 can be the result of one bracket 
including potentially any Boolean sub-expression as 
sub-query. The same is true for the second result set 
362, e.g. representing another bracket of the Boolean 
expression. 

[0084] FIGs. 13 and 14 illustrate how the first and the
second result sets 361, 362 can be merged into a single
result set 363 when applying corresponding Boolean
operators to the result flags of the corresponding
InfoCourse anchors. One implementation of a data retrieval
algorithm using pointer lists is explained in more detail
in FIG. 16. Another implementation using bitmaps is
explained in more detail in FIGs. 17 to 19. In the example
of FIG. 14, the Boolean expression 500 combines the
first and second result sets with a Boolean OR operator.
[0085] For the combination, the InfoCourses or the
IC-anchors are not needed anymore. The number of result
flags in each result set is known by, for example, incre- 
menting a corresponding counter when creating the re- 
sult flags. 

[0086] In one implementation, the data storage sys- 
tem runs through one of the result sets from the first to 
the last result flag. Advantageously, the result set includ- 
ing the lowest number of result flags is chosen because 
of a shorter processing time, which becomes more rel- 
evant in the case of Boolean AND combinations. The 
first result set 361 includes three result flags (C-FLAG1)
and the second result set 362 includes two result flags
(C-FLAG2). Therefore, the data storage system starts
with the second result set 362 and then processes the
first result set 361. In one implementation, for each
IC-anchor to which a result flag C-FLAG1 or C-FLAG2
relates, a corresponding result flag R-FLAG is generated in
the third result set 363 having the result identification
number 353.

[0087] In another implementation, one can also use
the first or second result set for storing the result of the
Boolean OR operation.

[0088] For example, when running through the sec- 
ond result set 362, each C-FLAG2 can be "renamed" 
into C-FLAG1. The result ID 352 of the second result 
set is set to the result ID 351 of the first result set. 
[0089] Further, it is checked whether a corresponding
C-FLAG1 result flag exists. If not, the data storage system
proceeds with the next C-FLAG2 in the second result
set 362. If a corresponding C-FLAG1 exists, then
one of these two result flags, either in the first or in the
second result set, is deleted to avoid intersections.
[0090] To find out whether a corresponding C-FLAG1
exists in the first result set 361, for example, the data
storage system moves along a circular structure that is
used to relate the result flags to their corresponding
IC-anchor.

[0091] After having processed all result flags of the
first and second result sets, only C-FLAG1 result flags
remain. The combination with OR means linking the two
result sets together into one result set. In this example, in
the end, all result flags have the result ID 351 of the first
result set 361. During the above described procedure
the counters for the number of result flags in each result
set are continuously updated (e.g., decremented when
result flags are deleted). Therefore, the number of result
flags in the "final" result set is the sum of the counters
of the first and second result sets just prior to linking them
together. This count result can be reported to an
application as the number of InfoCourses (records) matching
the first portion 501 or the second portion 502 of the
Boolean expression 500.
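
The in-place OR combination can be sketched as follows. This is a simplified model, not the pointer-list implementation itself: each result set is represented as a Python set of IC-anchor numbers, and the function name `combine_or` is hypothetical. The example reuses the three-flag and two-flag result sets from the text, with an assumed membership.

```python
def combine_or(first, second):
    """Merge `second` into `first`; both are sets of IC-anchor numbers."""
    count_first, count_second = len(first), len(second)
    for anchor_id in list(second):
        if anchor_id in first:
            # A corresponding C-FLAG1 exists: delete one of the two flags
            # and decrement the counter of the set it was removed from.
            second.remove(anchor_id)
            count_second -= 1
    # The final count is the sum of both counters just before linking.
    total = count_first + count_second
    first |= second   # remaining C-FLAG2 flags now carry the first result ID
    second.clear()
    return first, total


first_set = {310, 311, 313}    # assumed: three C-FLAG1 flags
second_set = {311, 312}        # assumed: two C-FLAG2 flags
merged, hits = combine_or(first_set, second_set)
print(sorted(merged), hits)
```

Running through the smaller set keeps the number of membership checks low, which is why the text starts with the second result set.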

[0092] The "final" result set may represent a real final
result set or an intermediate result when the Boolean
expression 500 includes further portions. In this case, it
is combined again with further result sets that correspond
to the further portions. A complex query consisting
of several nested sub-queries may be evaluated
recursively by combining the result sets of sub-queries
with the result sets of other sub-queries. This continues 
until all levels of the Boolean expression are resolved. 
At the end, a single result set (e.g., result set 363) is left. 
Its number of result flags corresponds to the number of 
hits for the whole query (Boolean expression 500). 
[0093] In the example of FIG. 15, the Boolean expression
500 combines the first and second result sets with
a Boolean AND operator.

[0094] Again, the data storage system knows the
number of result flags in each result set from the
corresponding result counters and starts with processing the
result set with the lowest number of result flags. This is 
advantageous in the case of Boolean AND combina- 
tions because the total number of result flags can only 
be as large as the smallest result set. In the example of 
FIG. 13 the second result set 362 is the smaller one. 
[0095] For each IC-anchor to which both a result flag
C-FLAG1 and a result flag C-FLAG2 relate, a
corresponding result flag R-FLAG is generated in the
third result set 363.

[0096] In one implementation, one can also use the
first or second result set for storing the result of the
Boolean AND operation.

[0097] For each result flag C-FLAG2 of the second
result set 362, the data storage system checks whether a
corresponding result flag C-FLAG1 exists in the first
result set 361. If so, the result flag C-FLAG2 is retained and
the data storage system proceeds with the next result flag
of the second result set. If no corresponding result flag
C-FLAG1 is found in the first result set, then the result
flag C-FLAG2 in the second result set 362 is deleted.
[0098] At the end of this filtering procedure, the second
result set 362 includes the "final" result set and,
therefore, plays the role of the third result set 363. The
first result set 361 is not needed any more and can be
deleted.

[0099] Again, with each deletion of a result flag, the
corresponding counter is reduced accordingly. Therefore,
the counter of the second result set always contains
the current number of result flags C-FLAG2, which,
at the end of the filtering procedure, corresponds to the
number of hits for the query and may be reported to an
application.
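
The filtering procedure for the Boolean AND case can be sketched in the same simplified model, with each result set represented as a Python set of IC-anchor numbers. The function name `combine_and` and the set membership are assumptions for illustration.

```python
def combine_and(first, second):
    """Filter the smaller result set in place against the larger one."""
    if len(first) <= len(second):
        smaller, larger = first, second
    else:
        smaller, larger = second, first
    counter = len(smaller)            # result counter of the smaller set
    for anchor_id in list(smaller):
        if anchor_id not in larger:
            smaller.remove(anchor_id)  # no corresponding flag: delete it
            counter -= 1               # keep the counter in step
    return smaller, counter


first_set = {310, 311, 313}    # assumed: three C-FLAG1 flags
second_set = {311, 312}        # assumed: two C-FLAG2 flags
final_set, hits = combine_and(first_set, second_set)
print(sorted(final_set), hits)
```

Because the intersection can never be larger than the smaller input, iterating over the smaller set bounds the work by the size of that set.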

[0100] As in the Boolean OR case, the "final" result
set may represent a real final result set or an intermediate
result when the Boolean expression 500 includes
further portions that are subject to further combinations.
[0101] FIG. 16 illustrates a first implementation for the
result flags C-FLAG1, C-FLAG2 and the result sets.
[0102] In this first implementation, the data storage
system instantiates an instance (C-FLAG1, C-FLAG2)
of a result flag class for each result flag. Multiple result
flags for one InfoCourse 300 (record) are connected in
a ring structure 800. The ring structure 800 relates 330,
320 to the corresponding IC-anchor 310. Advantageously,
a docket element (D-FLAG) is used. The docket
element represents a counterpart of the IC-anchor
310 on the side of the result flags. One advantage is that
the docket element is decoupled from the IC-anchor in
the sense that it is derived from a different class than
the IC-anchor. Therefore, it can provide different
functions than the IC-anchor. These functions can be used
by the other result flags because the docket element is
instantiated from the same class as the result flags. The
decoupling allows instances of the result flag class to
consume less memory than a corresponding IC-anchor
that has, for example, more pointers, additional
administrative information (e.g., number of elements in a
substructure), methods that operate on attributes, such as
"sort elements" or "balance tree", etc. The docket
element D-FLAG has a docket pointer 330 pointing at the
corresponding IC-anchor 310, whereas the IC-anchor
310 has an anchor pointer 320 pointing at the
corresponding docket element. Using the ring structure 800,
the data storage system can quickly identify any result
flag related to a specific IC-anchor.

[0103] To summarize, each InfoCourse has one IC-anchor
that relates to a corresponding docket element.
That is, an InfoCourse (record) 300 is represented by
an IC-anchor 310 and the corresponding docket
element D-FLAG. The docket element is the docking point
for the result flags C-FLAG1, C-FLAG2. A result flag
semantically plays the role of a dynamic flag. If a result
flag is connected to a docket element, the InfoCourse
that is represented by the docket element has been
selected. That is, it fulfils one or more conditions of the
original query.

[0104] Multiple result flags that relate to different
IC-anchors may be linked together in a pointer list by
means of pointers (e.g., pUp and pDown). This is also
valid for the docket elements, since technically speaking
they are also instances of the result flag class. A linear
list of result flags is called a result set. Each result set
is identified by a result ID. A result set flags a subset of
InfoCourses that comply with at least a portion of the
Boolean expression 500 in the query.
[0105] In the example, the first result set 361 is implemented
by the first pointer list PL-1 that includes the
result flag pointers C-FLAG1 and has the result ID 351.
The second result set 362 is implemented by the second
pointer list PL-2 that includes the result flag pointers
C-FLAG2 and has the result ID 352. The docket elements
formally are also linked in a pointer list PL-D having
its own result ID 350.

[0106] Several result sets may exist simultaneously.
On the level of a docket element D-FLAG, the result
flags can be linked in the circular structure 800 using
pointers, such as pointer pSmallId and pointer pLargeId.
The pointer names indicate that the result flags in the
circular structure 800 can be sorted by result ID. The
circular structure 800 can be run through in both
directions, e.g. to find the result flag of a particular result set.
Sorting the result flags in the circular structure 800 by
result ID helps to decide in which direction the circular
structure should be searched for a fast identification of
a certain result ID.

[0107] FIGs. 13 and 14 describe an implementation
for applying the Boolean OR and AND operators to two
result sets. These operators may be combined with a
Boolean NOT operator. In this case, the data storage
system runs through the docket elements of all IC-anchors
and instantiates a result flag in a new result set
each time there is no result flag in the original result
set to which the NOT operator is applied. At the end
of the procedure the original result set is not used any
more and can be deleted. Note that the InfoCourses are
not needed to perform the inversion. Only the IC-anchors
are used.
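
A minimal sketch of the inversion, again modelling a result set as a Python set of IC-anchor numbers. The function name `combine_not` and the example membership are assumptions.

```python
def combine_not(all_anchor_ids, original):
    """Flag every IC-anchor that has no result flag in `original`."""
    inverted = set()
    for anchor_id in all_anchor_ids:     # run through all docket elements
        if anchor_id not in original:
            inverted.add(anchor_id)      # instantiate a flag in the new set
    return inverted, len(inverted)


all_ids = [310, 311, 312, 313]
inverted, hits = combine_not(all_ids, {311, 312})
print(sorted(inverted), hits)
```

Unlike the OR and AND cases, the cost here grows with the total number of IC-anchors rather than with the size of the result set.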

[0108] The number of hits as well as some or all of the 
InfoCourses that match the query may be returned to 
an application. As an example, assume that 20.000 In- 
foCourses are found. That is, the final result set contains 
20.000 result flags. If an application requests the next 
20 InfoCourses after the 5.390th InfoCourse from the 
data storage system, then the request can be satisfied 
by using the final result set. The result flag 5.390 (offset) 
is located by running down the final result set and count- 
ing the result flags until the offset result flag at position 
5.390 is reached. The next 20 InfoCourses are read
from the corresponding tree structures (e.g., by using
the IC-anchors that relate to the corresponding docket
elements). The retrieved values may be serialized, for
example, into a Send-Buffer-Structure or any other kind
of appropriate communication data structure. Any type
of transport format and/or rearrangement, concatenation,
etc. of data may be used for the Send-Buffer-Structure
(e.g., the use of fixed lengths). Preferably, the
application knows the data format provided by the data
storage system to ensure stable communication.
[0109] For a fast localisation of a specific InfoCourse
(e.g., number 5.390) it is useful to subdivide a result set
into intervals. One can use an interval pointer which
points to the result flag in the middle of the result set
(e.g., result flag 10.000 of 20.000) or to any other
sub-interval of the result set, such as quarters. According to
the offset requested by the application, the data storage
system can jump to the nearest interval pointer and then
sequentially run through only a part of the result set
(e.g., upwards or downwards) and count until the requested
result flag (e.g., docket element D-FLAG) has been
reached. It is useful to choose the direction having the
shortest distance to the requested offset position. For
example, assume that there are 20.000 result flags in
the result set. If InfoCourse 15.390 is requested as an
offset and no interval pointers are available, then it is
advantageous to start at the bottom of the result set
(result flag 20.000) and run through 20.000 - 15.390 +
1 = 4.611 result flags instead of starting at the top and
running through 15.390 result flags. The same is true when
using interval pointers.
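
The choice of starting point can be sketched as follows. The function name `steps_to_offset` and the representation of interval pointers as a list of 1-based positions are assumptions; counting is inclusive, as in the 4.611 example above.

```python
def steps_to_offset(offset, total, interval_pointers=()):
    """Return (start_position, flags_visited) for the shortest walk.

    Positions are 1-based. The walk may start at the top (1), at the
    bottom (total), or at any interval pointer position.
    """
    starts = [1, total, *interval_pointers]
    best = min(starts, key=lambda s: abs(offset - s))
    # Inclusive count: the start flag and the offset flag are both visited.
    return best, abs(offset - best) + 1


# No interval pointers: starting at the bottom is cheaper than at the top.
print(steps_to_offset(15390, 20000))
# With quarter pointers, the walk starts at position 15000 instead.
print(steps_to_offset(15390, 20000, [5000, 10000, 15000]))
```

With no interval pointers the sketch visits 4611 flags from the bottom, against 15390 from the top, which matches the figures in the text.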

[0110] For example, the above described implementation
may correspond to the algorithm An (cf. FIG. 4). To
achieve the best data retrieval performance, it can be
advantageous to use this algorithm An when the current
value of the number of hits (e.g., pc; cf. FIG. 4) is below
the threshold value B(n) (cf. FIG. 4) in the "number of
hits" dimension and the complexity of the Boolean
expression is in the interval [B'(n - 1), B'(n)] of the second
dimension. The second dimension parameter p' in FIG.
4, therefore, reflects the complexity of the Boolean
statement in this example. The threshold value B'(n)
may be defined through a Boolean AND expression
including a single condition portion ("single-condition-
Boolean-AND").

[0111] In the previously explained general result flag
instance based implementation, the Boolean AND
operator was applied to two result sets. In another
implementation a "Lean AND" can be implemented in case
only one result set exists as a result of one portion of
the Boolean expression and this result set is to be
combined with a single condition through a Boolean AND
operator (Boolean AND expression). The query may
have a syntax like: (<complex Subquery>) AND
condition C1. The "Lean AND" can also be used for multiple
non-nested conditions combined with AND at the same
bracket level. A syntax example for this kind of flat
Boolean expression is: C1 AND C2 AND ... AND Cn,
where C1 to Cn are conditions.

[0112] This "Lean AND" implementation may, for example,
correspond to the algorithm A'n (cf. FIG. 4). To
achieve the best data retrieval performance, it can be
advantageous to use the algorithm A'n at the "number
of hits" value pc when the complexity of the Boolean
AND expression is below a certain threshold value
(e.g., in the second dimension interval [B'(n), B'(n+1)], cf.
FIG. 4).
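
The idea of selecting a retrieval algorithm by comparing current parameter values with stored thresholds can be sketched as below. This is a strongly simplified illustration: the function name, the Boolean `is_lean_and_candidate` flag standing in for the complexity dimension, the threshold value and the returned labels are all assumptions, and the real threshold grid of FIG. 4 is not reproduced.

```python
def select_algorithm(number_of_hits, is_lean_and_candidate, hits_threshold):
    """Choose a retrieval algorithm from current values and a threshold.

    number_of_hits        current value of the "number of hits" parameter
    is_lean_and_candidate True for a flat AND expression or a sub-query
                          combined with a single condition (assumed test)
    hits_threshold        stored threshold in the "number of hits" dimension
    """
    if is_lean_and_candidate:
        return "lean_and"        # check conditions directly on InfoCourses
    if number_of_hits < hits_threshold:
        return "pointer_list"    # result flag instances in pointer lists
    return "bitmap"              # one bit per IC-anchor for large results


HITS_THRESHOLD = 100_000  # hypothetical value, would be tuned per system
print(select_algorithm(20_000, False, HITS_THRESHOLD))
print(select_algorithm(3_000_000, False, HITS_THRESHOLD))
print(select_algorithm(20_000, True, HITS_THRESHOLD))
```

Keeping the thresholds in variables rather than in code is what allows them to be retuned for a given environment without changing the algorithms themselves.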

[0113] In the above examples, only one result set
exists and one or more conditions are to be combined with
the Boolean AND operator. 

[0114] Assume a Boolean expression, such as:

C1 AND C2 AND ... AND Cn.
[0115] As explained before, the data storage system
902 is able to quickly find out which of the conditions C1
to Cn has the lowest number of hits, that is, the highest
selectivity. The total number of hits in the intersection
set of all conditions cannot be larger than the number of
hits for the condition with the highest selectivity.
[0116] Therefore, the data storage system creates a
result set for the condition with the highest selectivity,
then runs through all result flags of the result set and
checks for each result flag whether the remaining conditions
are fulfilled by the corresponding InfoCourse or not. In
this implementation the InfoCourses are needed to
check, for example, a condition NAME_FIRST = 'Peter'.
The data storage system uses the relation from the
result flag through the docket element to the corresponding
IC-anchor, which points at the corresponding
InfoCourse. The corresponding InfoCourse tree is then
searched for the InfoType values according to the
remaining conditions.
[0117] In this implementation, a second result set does
not need to be checked against the result set because
the checking is performed directly on the related
InfoCourses. As a consequence, the time to instantiate all
the result flags of a second result set is saved by directly
searching the InfoCourses matching the result set (which
already reflects the most selective condition) and checking
directly whether the corresponding values match or not.
[0118] For each result flag this check is performed for
one or more conditions. For example, in a query C1 AND
C2 AND C3 AND C4, a result set is instantiated for the
most selective condition and for each result flag the
three other conditions are checked accordingly. If at
least one condition does not match, the corresponding
result flag is deleted from the result set and the result
counter is adjusted accordingly.
[0119] Finally, the result set flags all matching Info- 
Courses (records) and the result counter has the correct 
number of hits, which may be reported to an application. 
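
The "Lean AND" procedure can be sketched as follows. This is a simplified model under stated assumptions: records are held in a dictionary keyed by IC-anchor number, conditions are plain Python predicates, and the per-condition hit counts are obtained by a linear scan as a stand-in for the counts that the tree structures provide. The function name, field names and data are hypothetical.

```python
def lean_and(records, conditions):
    """Evaluate C1 AND ... AND Cn with a single result set."""
    # Stand-in for the hit counts delivered by the tree structures.
    counts = [sum(1 for rec in records.values() if cond(rec))
              for cond in conditions]
    best = counts.index(min(counts))          # most selective condition
    remaining = [c for i, c in enumerate(conditions) if i != best]

    # Result set for the most selective condition only.
    result_set = {aid for aid, rec in records.items()
                  if conditions[best](rec)}
    counter = len(result_set)

    # Check the remaining conditions directly on the related records.
    for aid in list(result_set):
        if not all(cond(records[aid]) for cond in remaining):
            result_set.remove(aid)
            counter -= 1
    return result_set, counter


records = {  # hypothetical data keyed by IC-anchor number
    310: {"NAME_FIRST": "Peter", "AGE": 30, "WEIGHT": 80},
    311: {"NAME_FIRST": "Peter", "AGE": 25, "WEIGHT": 70},
    312: {"NAME_FIRST": "Mary", "AGE": 30, "WEIGHT": 60},
    313: {"NAME_FIRST": "Paul", "AGE": 41, "WEIGHT": 90},
}
conditions = [
    lambda r: r["AGE"] >= 25,
    lambda r: r["NAME_FIRST"] == "Peter",
    lambda r: r["WEIGHT"] > 75,
]
matches, hits = lean_and(records, conditions)
print(sorted(matches), hits)
```

Only one result set is ever instantiated, so the saving grows with the number of hits that the less selective conditions would otherwise have produced.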
[0120] For example, the "Lean AND" implementation
can be advantageous when the time to instantiate the
result flag instances of the second result set exceeds
the time to check the corresponding InfoCourses.
[0121] If the Boolean expression has only one single
condition, result sets are not necessary. Result sets
become valuable in case of a combination of several
conditions in a corresponding query. In the particular case
of only one condition the count result for the total number
of hits is obtained by means of the tree structures as
described in FIGs. 5 to 11. If InfoCourses have to be
returned to an application, this can also be done by using
the tree structures. Instead of a result set identifying
the matching InfoCourses, the tree nodes as such are
used. For example, instead of running through a result
set to visit all matching InfoCourses and collect the data
into a Send-Buffer-Structure, only the matching sub-tree
structures identified by the corresponding start pointers
are traversed. As soon as the offset InfoCourse is found,
only as many InfoCourses have to be visited as are
to be returned. For example, when 10 InfoCourses have
to be returned to an application, only 10 nodes from the
offset InfoCourse on have to be traversed in the
corresponding InfoType sub-tree. From each node in an
InfoType tree the corresponding anchor object can be
reached, and from the anchor each attribute value of the
given InfoCourse can be reached.
[0122] FIGs. 16 to 18 illustrate a second implementation
for the result flags C-FLAG1, C-FLAG2 and the result
sets, leading to a second algorithm for the data
retrieval task.

[0123] This second implementation is appropriate for 
very large result sets, where the first implementation 
would require many result flag instances eating up a lot 
of memory space of the data storage system. 
[0124] FIG. 17 illustrates three bitmaps BM-n,
BM-n+1, BM-n+2. In the example, the start of bitmap BM-n
coincides with the basis address of bitmaps in the
memory of the data storage system.
[0125] For example, a first bitmap BM-n corresponds 
to the first result set 361 and a second bitmap BM-n+2 
corresponds to the second result set 362. The result 
flags C-FLAG1, C-FLAG2 are implemented as bits in the
respective bitmaps. 

[0126] A bitmap in general consists of multiple ma- 
chine words. Depending on the hardware architecture 
of the data storage system, a machine word can consist, 
for example, of 32 or 64 bits. The second implementa- 
tion also works with any other machine word length, 
such as 128 bit or more. A bitmap is a contiguous con- 
catenation of machine words in a sufficiently large area 
of the data storage system memory. The number of bits
in a (result set) bitmap corresponds to the number of
IC-anchors of the InfoCourses selected by the Boolean
expression 500. Therefore, each bitmap has the maximum
size of a result set. Multiple bitmaps can simultaneously
exist in the memory. Each bitmap (result set) is identified
by a corresponding result ID. Each result ID points to
the start address of its corresponding bitmap. For
example, the first result ID 351 and the second result ID
352 point to the start address of the first bitmap BM-n
and the second bitmap BM-n+2, respectively.
[0127] Assume that 5 million records (InfoCourses) are
loaded into the tree structures of the data storage system.
Therefore, 5 million IC-anchors exist and, in this
example, one bitmap includes 5 million bits (one bit per
IC-anchor). The bitmap occupies 5.000.000 / 8 =
625.000 bytes, or approximately 610 KB. The 5 million bits
correspond to 5.000.000 / 64 = 78.125 machine words on a
64-bit hardware platform and to 5.000.000 / 32 = 156.250
machine words on a 32-bit hardware platform. This example
shows that a bitmap can be made up of tens of thousands
or even more machine words. The number of machine
words in a bitmap is only physically limited by the
size of the available main memory and the addressability
of the main memory.
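
The arithmetic of this example can be checked with a short sketch. The function name `bitmap_size` is hypothetical; ceiling division is used so that anchor counts that are not a multiple of the word length are also covered.

```python
def bitmap_size(n_anchors, word_bits):
    """Return (bytes, machine_words) needed for one bit per IC-anchor."""
    n_bytes = -(-n_anchors // 8)          # ceiling division
    n_words = -(-n_anchors // word_bits)
    return n_bytes, n_words


bytes_64, words_64 = bitmap_size(5_000_000, 64)
bytes_32, words_32 = bitmap_size(5_000_000, 32)
print(bytes_64, words_64, words_32)
print(round(bytes_64 / 1024))   # size in KB, rounded
```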

[0128] That is, each bitmap may consist of a theoretically
unlimited number of machine words, where the
length of a machine word depends on the given hardware
platform and/or the operating system of the data
storage system. Each bitmap is referenced by a result
ID. Preferably, the result IDs 351, 352 are stored in a
tree structure allowing direct access to the start address
of the corresponding bitmap via a pointer. In general,
any structure (e.g., a linear list) can be used to
administrate the result IDs. However, for large numbers of
result IDs the access to a specific result set is more
efficient when using a tree structure than when using a
linear list or another structure.

[0129] When the start address of a specific bitmap 
has been found, this specific bitmap can be used to 
count the number of hits (number of result flags) or to 
return data to an application. 

[0130] In the second implementation, each bitmap
has a counter counting the number of result flags, that
is, the number of bits set to 1. To count the number of
hits, the data storage system can run through all machine
words of the bitmap. If a machine word has a value
of zero, then all bits of the machine word are zero and
the next machine word can be checked. For machine
words having a value different from zero the data storage
system determines the number of bits that are set
to 1. This can be achieved by known methods, such as
shifting the bits of a machine word in one direction,
testing with bit masks performing a bit by bit AND
operation, etc. Each time a bit set to 1 is found, the counter is
increased by 1. At the end of the procedure the counter
value corresponds to the number of bits set to 1 and,
therefore, the number of result flags in the corresponding
result set.
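
The counting procedure can be sketched with the shift-and-mask method mentioned above. The bitmap is modelled as a list of non-negative Python integers, each standing for one machine word; the function name `count_hits` and the sample words are assumptions.

```python
def count_hits(words):
    """Count the bits set to 1 in a bitmap given as a list of machine words."""
    counter = 0
    for word in words:
        if word == 0:
            continue              # all bits are zero: skip to the next word
        while word:
            counter += word & 1   # test the lowest bit with a bit mask
            word >>= 1            # shift the remaining bits down
    return counter


bitmap = [0, 0b1011, 0, 1 << 63]  # four sample 64-bit words
print(count_hits(bitmap))
```

Skipping zero words pays off for sparse result sets, where most machine words contain no set bit at all.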

[0131] FIG. 18 illustrates how result flags relate to
corresponding IC-anchors in the second implementation
using memory mapping. The shown bits in the bitmap
memory represent only a portion of the bitmap memory
area.

[0132] In contrast to the first implementation, bits of a 
bitmap and the corresponding IC anchors are not linked 
by pointers. 

[0133] However, IC-anchors and their con-esponding 
bits are related by a memory mapping rule using relative 
addresses. A memory manager of the data storage sys- 
tem can ensure that the IC-anchors and the bitmaps re- 
side in contiguous memory areas. The data storage sys- 
tem can then locate any IC-anchor that relates to a spe- 
cific bit in a bitmap. 

[0134] For retrieving data (InfoCourses) in response to a query the
data storage system identifies the corresponding IC-anchors. Using the
IC-anchor and the corresponding InfoCourse tree a specific node in an
InfoType tree can be found and the value can be read from the node.
The value can then be copied, for example, to a Send-Buffer-Structure
as described earlier. To locate the specific bit that corresponds to
the identified IC-anchor the data storage system can use an algorithm
that works with relative addresses.

[0135] A specific bit is part of a machine word. Assume that this
specific bit is bit number K. The machine word has a memory address
MWA. The whole bitmap has a start address SA. For example, the
relative address of the specific bit is calculated as
BA = (MWA - SA) * 64 + K for 64 bit long machine words and
BA = (MWA - SA) * 32 + K for 32 bit long machine words. At this memory
location, the specific bit of the bitmap can be checked. If it is set
to 1, the InfoCourse with the corresponding IC-anchor is part of the
result set.

[0136] The IC-anchor can be found in the IC-anchor memory area in the
following way. All IC-anchors reside in the IC-anchor memory area with
the basis address C. The size AS of an IC-anchor is known. Therefore,
the IC-anchor address AA of the specific IC-anchor can be calculated
as AA = C + BA * AS. A pointer that is set to the address AA thus
points to the requested IC-anchor.
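
The two address calculations can be illustrated with a short sketch. It assumes that the difference MWA - SA is expressed in units of machine words (i.e., as the index of the word within the bitmap); the function names and the numeric values in the example are hypothetical.

```python
def bit_position(word_index, k, word_bits=64):
    """BA = (MWA - SA) * word_bits + K, with word_index standing for MWA - SA."""
    return word_index * word_bits + k


def anchor_address(ba, base_c, anchor_size):
    """AA = C + BA * AS."""
    return base_c + ba * anchor_size


# Example with assumed values: bit 5 of the third machine word (index 2),
# IC-anchor area starting at address 4096, 48 bytes per IC-anchor.
ba = bit_position(2, 5)
print(ba, anchor_address(ba, 4096, 48))
```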

[0137] When creating a result set bitmap in compliance with a portion
of the Boolean expression, the result flag bits that relate to
IC-anchors of the corresponding InfoCourses are set to 1. This can
also be achieved by using relative addresses.

[0138] Each IC-anchor has a memory address AA. The basis address of
the IC-anchor memory area is C. By knowing the size AS of an
IC-anchor, the IC-anchor position number can be calculated as
BA = (AA - C) / AS. That is, the result flag bit for the BAth
IC-anchor is to be located in the bitmap memory area. The start
address of a specific bitmap (result set identified by result ID) is
SA. The machine word address MAW where the bit is located is
calculated as MAW = SA + BA div 64 on a 64 bit hardware platform and
MAW = SA + BA div 32 on a 32 bit hardware platform, where the div
operator divides one integer number by another integer number and
returns the integer part of the result. Within the identified machine
word at MAW the Kth bit has to be set to 1 with K = BA mod 64 on a
64 bit hardware platform and K = BA mod 32 on a 32 bit hardware
platform, where the mod operator divides two integer numbers and
returns only the remainder. Alternatively, K could also be
calculated as:

K = BA - (MAW - SA) * BS,

where BS is the addressable number of bits per machine word (64 or 32
in the examples above) in the used hardware platform/operating system.
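
The reverse mapping, from an IC-anchor address to the result flag bit, can be sketched in the same way. The sketch assumes that the bitmap is a list of integers indexed by machine word, so that the list index plays the role of MAW - SA, and that bit number K is addressed as `1 << K`; both are illustrative assumptions, as are the function names.

```python
def locate_bit(aa, base_c, anchor_size, word_bits=64):
    """Return (BA, word index, K) for the IC-anchor at address AA."""
    ba = (aa - base_c) // anchor_size       # BA = (AA - C) / AS
    word_index, k = divmod(ba, word_bits)   # BA div word_bits, BA mod word_bits
    return ba, word_index, k


def set_result_flag(bitmap_words, aa, base_c, anchor_size, word_bits=64):
    """Set the result flag bit that belongs to the IC-anchor at address AA."""
    _, word_index, k = locate_bit(aa, base_c, anchor_size, word_bits)
    bitmap_words[word_index] |= 1 << k


bitmap = [0, 0, 0]
set_result_flag(bitmap, 10480, 4096, 48)
print(locate_bit(10480, 4096, 48), bitmap)
```

The alternative formula for K can be checked against the result: with BA = 133 and word index 2 on a 64 bit platform, 133 - 2 * 64 = 5.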

[0139] FIG. 19 illustrates how AND/OR/NOT operators can be applied to
the first and second bitmap BM-n, BM-n+2 by sequentially combining the
corresponding pairs of machine words. The machine words are
illustrated by circles with a number that corresponds to the position
of the machine word within its bitmap.
[0140] Machine word 1 of the first bitmap BM-n is combined with
machine word 1' of the second bitmap BM-n+2. This is repeated for all
pairs of machine words (2,2'), (3,3'), and so on, with respect to the
first and second bitmaps. Since all IC-anchors are represented by a
corresponding bit in each of the bitmaps, all bitmaps have the same
size and, thus, contain the same number of machine words.

[0141] The logical combination of pairs of machine words by applying
the Boolean AND or OR operators can be performed as a bit by bit AND
or OR operation. Usually, the CPU (processor) can perform this in one
processing cycle. Programming languages, such as C++, offer commands
for this kind of bit by bit operations.

[0142] The result of the combination of first and second result set
bitmaps may be written to a new, third bitmap (e.g., BM-n+1) or to one
of the two original bitmaps BM-n, BM-n+2. This depends on whether the
original bitmaps may be overwritten or are to be kept for later use.

[0143] After the processing of each pair of machine words the result
flag counter counts how many bits are set to 1 in the resulting
machine word. The sum of the counting results for all machine words of
the resulting bitmap corresponds to the total number of bits set to 1
in the resulting bitmap and may be reported to an application as the
number of hits.
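
The pairwise combination with a running hit count can be sketched as follows. The sketch assumes bitmaps held as equally long lists of integers and writes the result to a new, third bitmap; the function name `combine` is hypothetical.

```python
import operator


def combine(bm_a, bm_b, op):
    """Combine two bitmaps word by word and count the hits on the fly.

    op is operator.and_ or operator.or_; the originals are left untouched.
    """
    if len(bm_a) != len(bm_b):
        raise ValueError("bitmaps must contain the same number of machine words")
    result = []
    hits = 0
    for word_a, word_b in zip(bm_a, bm_b):
        word = op(word_a, word_b)        # one bit by bit operation per pair
        result.append(word)
        hits += bin(word).count("1")     # bits set to 1 in the resulting word
    return result, hits


print(combine([0b1100, 0b1], [0b1010, 0b0], operator.and_))
print(combine([0b1100, 0b1], [0b1010, 0b0], operator.or_))
```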

[0144] The application of the Boolean NOT operator to a bitmap is
performed as a bit by bit NOT operation applied to each machine word
in the bitmap. Again, the result may overwrite the original bitmap or
can be written to another bitmap if the original bitmap has to be kept
for later use.
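
A word-wise NOT can be sketched as shown below. The clearing of unused bits in the last machine word is an assumption added for the illustration (it matters only when the number of IC-anchors is not a multiple of the word size) and is not taken from the description; the function name is hypothetical.

```python
def bitmap_not(bitmap_words, n_anchors, word_bits=64):
    """Apply a bit by bit NOT to every machine word of a bitmap."""
    mask = (1 << word_bits) - 1
    result = [~word & mask for word in bitmap_words]
    # Assumption: bits beyond the last IC-anchor are padding and stay 0,
    # so that they are not counted as hits afterwards.
    tail = n_anchors % word_bits
    if tail and result:
        result[-1] &= (1 << tail) - 1
    return result


print(bitmap_not([0b0101], n_anchors=4, word_bits=8))
```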

[0145] The bit map implementation described so far may correspond to
the algorithm A(n + 1) (cf. FIG. 4) that has a better performance than
the algorithm An in case the number of hits pc exceeds the threshold
value B(n) and the complexity of the Boolean expression is in the
interval [B'(n - 1), B'(n)].

[0146] The "Lean AND", as described under FIG. 16, only needs one
result set and can also be implemented when using bitmaps. The bit map
"Lean AND" implementation may correspond to the algorithm A'(n + 1)
(cf. FIG. 4) that is to be used when the complexity of the Boolean AND
expression is in the interval [B'(n), B'(n + 1)] in the complexity
dimension and the current number of hits is in the interval
[B(n), B(n + 1)].

[0147] For example, the Boolean expression includes five conditions
that are combined by Boolean AND operators: C1 AND C2 AND C3 AND C4
AND C5. A result set bitmap is set up (as described in FIGs. 13, 14)
for the condition with the highest selectivity of all conditions
included in the Boolean expression. Then the data storage system runs
through the bitmap from the first to the last bit. For each bit that
is set to 1 the data storage system jumps to the corresponding
InfoCourse and checks if all other conditions are fulfilled by the
corresponding InfoCourse. This check is performed in the same way as
described in the implementation using result flag instances (cf.
FIG. 16). If all conditions are true the bit keeps its value 1,
otherwise the bit is set to 0. When a bit is set to 0, the
corresponding result counter containing the number of bits that are
set to 1 is reduced by 1. Therefore, the result counter always
contains the current number of hits.

[0148] Alternatively, instead of getting the number of hits from the
initial bitmap and reducing the counter each time a bit is set to 0
when an InfoCourse does not match the other conditions, the counter
may also count the number of bits that are set to 1 in the bitmap
after the "Lean AND" has been applied.
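
The bitmap "Lean AND" can be illustrated with the following sketch. It assumes that InfoCourses are reachable by their position number in a Python list and that the remaining conditions are given as predicates; both stand in for the IC-anchor and tree lookups of the description, and all names are hypothetical.

```python
def lean_and(bitmap_words, hit_count, other_conditions, info_courses, word_bits=64):
    """Clear every bit whose InfoCourse fails one of the remaining conditions.

    bitmap_words is modified in place; the updated hit count is returned.
    """
    for word_index, word in enumerate(bitmap_words):
        if word == 0:
            continue  # no candidate InfoCourses in this machine word
        for k in range(word_bits):
            if (word >> k) & 1:
                position = word_index * word_bits + k
                course = info_courses[position]
                if not all(cond(course) for cond in other_conditions):
                    word &= ~(1 << k)  # set the bit to 0
                    hit_count -= 1     # keep the result counter current
        bitmap_words[word_index] = word
    return hit_count


courses = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 0, "b": 2}, {"a": 1, "b": 2}]
bitmap = [0b1011]  # bitmap for the most selective condition, here a == 1
hits = lean_and(bitmap, 3, [lambda c: c["b"] == 2], courses, word_bits=8)
print(bin(bitmap[0]), hits)
```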
[0149] FIG. 20 is a simplified block diagram of soft- 
ware components of the computer system 990 that can 
be used with an embodiment of the invention to dynam- 
ically select a data retriever implementation in depend- 
ence of a specific environment. In the example, the data 
retriever implementations 111, 112, 113, and 114 are 
characterized by their respective algorithms An, A(n + 1), A'n,
and A'(n + 1).

[0150] A result set may potentially contain millions of result flags
given a sufficiently large number of InfoCourses loaded. Using the
first or third data retriever 111, 113, for example, on a 64 bit
architecture one pointer address already occupies 64 bit (8 bytes).
Each result flag has at least two pointers plus the content of the
result flag. Therefore, one result flag may occupy several hundreds of
bytes. In this case, one result set containing some millions of result
flags occupies memory space in the range of up to several hundreds of
megabytes. This is in addition to the memory space occupied by the
tree structures that also reside in main memory. Further, as the data
storage system using the first or third data retriever 111, 113
processes the result sets sequentially and checks result flag instance
by result flag instance to perform AND/OR combinations, this may lead
to processing times of several seconds for one combination when
applied to very large result sets (e.g., several millions of result
flag instances). One can apply an appropriate parallelisation to the
first implementation or use the second implementation to overcome
these issues.

[0151] The second and fourth data retriever implementations 112, 114
use bitmaps for result sets. Bitmaps are a representation of result
sets that consumes considerably less memory space than the result sets
of the first and third data retrievers. Further, bit by bit operations
are performed very fast and can usually be handled in one CPU
processing cycle per machine word, if supported by a programming
language, such as C++.

[0152] For very large result sets, such as several millions of hits
(e.g., in the interval [B(n), B(n + 1)]), it can be more time saving
to perform Boolean combinations of result sets using bitmaps instead
of pointer lists of result flag instances. The instantiation of
millions of result flag instances alone may last several seconds if
parallelisation is not used. In addition the time for performing the
Boolean combination has to be considered. The Boolean combination of
result flag instances is performed one by one at the instance level.

[0153] However, when a result set contains only a small number of hits
(e.g., in the interval [B(n-1), B(n)]) then bitmaps may be "almost
empty". That is, only a small number of bits is set to 1 in a large
number of machine words (e.g., for 5 million InfoCourses a bitmap
includes 78,125 machine words on a 64 bit platform). In a bad case
only one bit is set to 1 in each machine word. Therefore, for small
result sets the use of result flag instances may be advantageous.
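
The word count quoted above can be checked with a short calculation, which also contrasts the fixed size of a bitmap with the size of a list of result flag instances. The figure of 200 bytes per result flag is a purely hypothetical value chosen for the example; the description only states that a result flag may occupy several hundreds of bytes.

```python
def bitmap_word_count(n_info_courses, word_bits=64):
    """Machine words needed for one bit per InfoCourse (rounded up)."""
    return -(-n_info_courses // word_bits)


def bitmap_bytes(n_info_courses, word_bits=64):
    return bitmap_word_count(n_info_courses, word_bits) * word_bits // 8


def flag_list_bytes(n_hits, bytes_per_flag):
    """Memory for result flag instances; bytes_per_flag is an assumed figure."""
    return n_hits * bytes_per_flag


print(bitmap_word_count(5_000_000))   # machine words on a 64 bit platform
print(bitmap_bytes(5_000_000))        # bitmap size, independent of the hits
print(flag_list_bytes(1_000, 200))    # 1,000 hits at an assumed 200 bytes each
```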

[0154] Within the corresponding intervals in the "number of hits"
dimension a further performance dependency on the complexity of the
Boolean statement exists. This dependency has already been explained
under FIGs. 16 to 19.

[0155] The computer program product components in FIG. 20 allow the
data storage system to switch from one data retriever implementation
to another if the other implementation uses an algorithm that is more
advantageous in a specific environment. For example, this can be
achieved by transforming pointer lists of the first implementation
into bitmaps of the second implementation and vice versa. For these
transformations the above explained procedures for creating result
flag instances and for creating bitmaps can be used.

[0156] The data storage system decides by itself when it is
appropriate to use an implementation that is useful for small result
sets having a number of result flags below a threshold value (e.g., up
to several thousand elements) or an implementation that is useful for
large result sets having a number of result flags above the threshold
value. For each of these cases the data storage system further decides
whether the corresponding "Lean AND" algorithm A'n, A'(n + 1) or the
more general algorithm An, A(n + 1) is advantageously used.

[0157] This enables the data storage system to automatically select
the algorithm which is best in terms of memory consumption and
performance in a specific situation. If a current value of one
dimension equals a threshold value of this dimension, then it is not
important whether the algorithm for the interval above or below the
threshold value is used because, preferably, the threshold value is
defined as the break-even point for the two algorithms.

[0158] In the example, the threshold value in the "number of hits"
dimension may be the break-even point being defined as the number of
result flags in a result set, where the use of result flag instances
leads to the same system performance as the use of bitmaps. The data
storage system can determine the threshold value dynamically, for
example, by appropriate time measurements. Therefore, on a given
technology platform for a given data volume, data value distribution,
etc., the appropriate value for the threshold value can be used in a
stable environment at all times.

[0159] The query generator 101 generates the query that includes the
Boolean expression 500. For example, the query generator can be
implemented on the front end computing device 901. The query generator
101 can also be part of an application that runs on any other
computing device of the computer system 990.

[0160] Once the data storage system 902 receives the Boolean
expression through a corresponding interface, the result counter 102
determines the corresponding number of hits. Preferably, the result
counter is implemented in the data storage system 902.

[0161] The threshold evaluator 103 is able to perform
multidimensional comparisons. In other words, the threshold evaluator
can compare the current number of hits with the intervals
[B(n-1), B(n)] and [B(n), B(n+1)] and, substantially simultaneously,
the complexity of the Boolean expression with the intervals
[B'(n-1), B'(n)] and [B'(n), B'(n+1)]. FIG. 2 explains details about
how to initialise the threshold values defining the intervals.
[0162] In case the number of hits (result flags) is in
the interval [B(n-1), B(n)] and the Boolean expression 
is a complex Boolean expression from the interval [B'(n- 
1), B'(n)], the data storage system uses the first data 
retriever 111. This case is illustrated by bold solid con- 
nection lines between the threshold evaluator and the 
first data retriever. 

[0163] In case the number of hits (result flags) is in the interval
[B(n), B(n+1)] and the Boolean expression is a complex Boolean
expression from the interval [B'(n-1), B'(n)], the data storage system
uses the second data retriever 112. This case is illustrated by bold
dotted connection lines between the threshold evaluator and the second
data retriever.

[0164] In case the number of hits (result flags) is in the interval
[B(n-1), B(n)] and the Boolean expression includes a
"single-condition-Boolean-AND" expression from the interval
[B'(n), B'(n+1)], the data storage system uses the third data
retriever 113. This case is illustrated by bold dashed connection
lines between the threshold evaluator and the third data retriever.

[0165] In case the number of hits (result flags) is in the interval
[B(n), B(n+1)] and the Boolean expression includes a
"single-condition-Boolean-AND" expression from the interval
[B'(n), B'(n+1)], the data storage system uses the fourth data
retriever 114. This case is illustrated by bold dotted-dashed
connection lines between the threshold evaluator and the fourth data
retriever.
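
The two-dimensional selection described in the four cases above can be sketched as a table lookup. The sketch assumes a single threshold per dimension (B(n) for the number of hits, B'(n) for the complexity dimension) and follows the interval assignment of the description, in which values in the upper complexity interval [B'(n), B'(n+1)] denote "single-condition-Boolean-AND" expressions. The function name and the numeric thresholds in the example are hypothetical.

```python
# (hit interval, complexity interval) -> reference number of the data retriever
RETRIEVERS = {
    (0, 0): 111,  # few hits, complex expression: result flag instances (An)
    (1, 0): 112,  # many hits, complex expression: bitmaps (A(n + 1))
    (0, 1): 113,  # few hits, Lean AND: result flag instances (A'n)
    (1, 1): 114,  # many hits, Lean AND: bitmaps (A'(n + 1))
}


def select_retriever(hits, complexity, hit_threshold, complexity_threshold):
    """Pick the retriever assigned to the intersection of both intervals.

    A value equal to a threshold is assigned to the lower interval; at a
    break-even point either choice is expected to perform equally well.
    """
    hit_interval = 1 if hits > hit_threshold else 0
    complexity_interval = 1 if complexity > complexity_threshold else 0
    return RETRIEVERS[(hit_interval, complexity_interval)]


print(select_retriever(hits=100, complexity=1, hit_threshold=3000,
                       complexity_threshold=5))
```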
[0166] The retrieval time measuring component 104 
of the data storage system can measure the time that is 
consumed by either data retriever implementation. 
[0167] The threshold calculator 105 can dynamically determine
(re-calculate) the threshold values of the various dimensions on the
basis of the time measurements with respect to the four data retriever
implementations. The recalculated threshold values can be fed into the
threshold evaluator 103 and used for the next query.
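
The description does not prescribe how the threshold is recalculated from the time measurements. One possible approach, shown here purely as an assumption, is to estimate the break-even point by linear interpolation between the two measurements where the faster retriever changes; the function name and the sample values are hypothetical.

```python
def break_even(samples):
    """Estimate the number of hits at which two retrievers take equal time.

    samples: list of (hits, time_a, time_b) tuples sorted by hits.
    Returns None if the measurements show no change of the faster retriever.
    """
    previous = None
    for hits, time_a, time_b in samples:
        diff = time_a - time_b
        if diff == 0:
            return hits
        if previous is not None and (previous[1] < 0) != (diff < 0):
            hits0, diff0 = previous
            # Linear interpolation of the zero crossing of the time difference.
            return hits0 + (hits - hits0) * (0 - diff0) / (diff - diff0)
        previous = (hits, diff)
    return None


# Assumed measurements: retriever A is faster at 1,000 hits, B at 5,000 hits.
print(break_even([(1000, 1.0, 2.0), (5000, 5.0, 2.0)]))
```

The returned value could then be fed back into the threshold evaluator as the new threshold of the "number of hits" dimension.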

[0168] In general, there can be more threshold values that correspond
to even more data retrievers for even more dimensions. That is, there
can be further dependencies on further parameters that are considered
by the threshold evaluator for selecting the appropriate data
retriever.

[0169] Embodiments of the invention can be implemented in digital
electronic circuitry, or in computer hardware, firmware, software, or
in combinations of them. The invention can be implemented as a
computer program product, i.e., a computer program tangibly embodied
in an information carrier, e.g., in a machine-readable storage device
or in a propagated signal, for execution by, or to control the
operation of, data processing apparatus, e.g., a programmable
processor, a computer, or multiple computers. A computer program can
be written in any form of programming language, including compiled or
interpreted languages, and it can be deployed in any form, including
as a stand-alone program or as a module, component, subroutine, or
other unit suitable for use in a computing environment. A computer
program can be deployed to be executed on one computer or on multiple
computers at one site or distributed across multiple sites and
interconnected by a communication network.

[0170] Method steps of the invention can be performed by one or more
programmable processors executing a computer program to perform
functions of the invention by operating on input data and generating
output. Method steps can also be performed by, and apparatus of the
invention can be implemented as, special purpose logic circuitry,
e.g., an FPGA (field programmable gate array) or an ASIC
(application-specific integrated circuit).

[0171] Processors suitable for the execution of a computer program
include, by way of example, both general and special purpose
microprocessors, and any one or more processors of any kind of digital
computer. Generally, a processor will receive instructions and data
from a read-only memory or a random access memory or both. The
essential elements of a computer are at least one processor for
executing instructions and one or more memory devices for storing
instructions and data. Generally, a computer will also include, or be
operatively coupled to receive data from or transfer data to, or both,
one or more mass storage devices for storing data, e.g., magnetic,
magneto-optical disks, or optical disks. Information carriers suitable
for embodying computer program instructions and data include all forms
of non-volatile memory, including by way of example semiconductor
memory devices, e.g., EPROM, EEPROM, and flash memory devices;
magnetic disks, e.g., internal hard disks or removable disks;
magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and
the memory can be supplemented by, or incorporated in special purpose
logic circuitry.

[0172] To provide for interaction with a user, the invention can be
implemented on a computer having a display device, e.g., a cathode ray
tube (CRT) or liquid crystal display (LCD) monitor, for displaying
information to the user and a keyboard and a pointing device, e.g., a
mouse or a trackball, by which the user can provide input to the
computer. Other kinds of devices can be used to provide for
interaction with a user as well; for example, feedback provided to the
user can be any form of sensory feedback, e.g., visual feedback,
auditory feedback, or tactile feedback; and input from the user can be
received in any form, including acoustic, speech, or tactile input.

[0173] The invention can be implemented in a computing system that
includes a back-end component, e.g., as a data server, or that
includes a middleware component, e.g., an application server, or that
includes a front-end component, e.g., a client computer having a
graphical user interface or a Web browser through which a user can
interact with an implementation of the invention, or any combination
of such back-end, middleware, or front-end components. The components
of the system can be interconnected by any form or medium of digital
data communication, e.g., a communication network. Examples of
communication networks include a local area network (LAN) and a wide
area network (WAN), e.g., the Internet.

[0174] The computing system can include clients and
servers. A client and server are generally remote from 
each other and typically interact through a communica- 
tion network. The relationship of client and server arises 
by virtue of computer programs running on the respec- 
tive computers and having a client-server relationship 
to each other. 

[0175] Although an embodiment of the invention has been described in
detail using a data storage system having a plurality of data
retrieval algorithms, the invention is not limited to this embodiment.
Rather, other software applications making use of the spirit of the
invention as broadly described by the claims are considered to be
within the scope of the invention.



Claims 

1. A computer implemented method for automatic software tuning
comprising the steps of:

calculating (410) at least one threshold value for at least one
parameter (P1) influencing the performance of a software application
(200) with regards to a specific task;
comparing (430) the at least one threshold value to at least one
corresponding current value; and
selecting (440) an algorithm (A1) from a plurality of algorithms (A1
to AN) for performing the task in accordance with the result of the
comparing step (430).

2. Method of claim 1 comprising the further steps of: 

measuring (450) the performance of the select- 
ed algorithm (A1); 

checking (460) whether the measured perform- 
ance complies with the at least one threshold 
value; and 

recalculating (470) the at least one threshold 
value in case of non-compliance. 

3. Method of any one of the previous claims, where 
the at least one threshold value separates the value 
range of the parameter (P1) into at least two inter- 
vals of a first dimension. 

4. Method of claim 3, wherein the selecting step (440) selects the
algorithm (A1) that is assigned to the interval that includes the
corresponding current value of the first dimension.

5. Method of claim 3, where at least one further threshold value
separates the value range of a further parameter into at least two
intervals of a second dimension.

6. Method of claim 5, wherein the selecting step (440) selects the
algorithm (A1) that is assigned to the intersection of the interval of
the first dimension that includes the corresponding current parameter
value of the first dimension and the interval of the second dimension
that includes the corresponding current parameter value of the second
dimension.

7. Method of any one of the claims 3 to 6, wherein each 
threshold value corresponds to a break-even point 
where two neighbouring algorithms have the same 
performance with respect to the corresponding di- 
mension. 

8. A computer program product for automatic software tuning comprising
a plurality of instructions that when loaded into a memory of a
computer system (990) cause at least one processor of the computer
system (990) to execute the steps of any one of the claims 1 to 7.

9. Information carrier comprising the computer program product of
claim 8.

10. A computer program product for dynamically selecting a data
retriever implementation for retrieving data from a data storage
system (902) in response to a Boolean expression (500) comprising:

a result counter (102) to determine a number of hits in response to
the Boolean expression;
a threshold evaluator (103) to compare the number of hits with a
threshold value of a first dimension and to compare the complexity of
the Boolean expression with a further threshold value of a second
dimension;
a first data retriever (111) to retrieve the data in case the number
of hits is below the threshold value of the first dimension and the
complexity of the Boolean expression is above the further threshold
value of the second dimension;
a second data retriever (112) to retrieve the data in case the number
of hits is above the threshold value of the first dimension and the
complexity of the Boolean expression is above the further threshold
value of the second dimension;
a third data retriever (113) to retrieve the data in case the number
of hits is below the threshold value of the first dimension and the
complexity of the Boolean expression is below the further threshold
value of the second dimension; and
a fourth data retriever (114) to retrieve the data in case the number
of hits is above the threshold value of the first dimension and the
complexity of the Boolean expression is below the further threshold
value of the second dimension.

11. The computer program product of claim 10, further comprising:

a retrieval time measuring component (104) to measure the time that is
consumed by a selected data retriever (111, 112, 113, 114) for various
numbers of hits; and
a threshold calculator (105) to dynamically determine the threshold
value and the further threshold value on the basis of the results of
the retrieval time measuring component (104) and to feed back the
determined threshold values into the threshold evaluator (103).

12. The computer program product according to claim 11, where the
first data retriever (111) is implemented by using a general data
retrieval algorithm using result flag instances.

13. The computer program product according to claim 11 or 12, where
the second data retriever (112) is implemented by using a general data
retrieval algorithm using bit maps.

14. The computer program product according to any one of the claims 11
to 13, where the third data retriever (113) is implemented by using a
lean AND data retrieval algorithm using result flag instances.

15. The computer program product according to any one of the claims 11
to 14, where the fourth data retriever (114) is implemented by using a
lean AND data retrieval algorithm using bit maps.

16. A computer system (990) comprising: 

a memory to store a computer program product 
according to any one of the claims 10 to 15; and
at least one processor to execute instructions 
of the computer program product according to 
any one of the claims 10 to 15. 

17. A computer system (990) for running a software application (200)
comprising:

variables (210) for storing at least one threshold value for at least
one parameter (P1) influencing the performance of the software
application (200) with regards to a specific task; and
a threshold evaluator (220) for comparing (430) the at least one
threshold value to at least one corresponding current value allowing
the software application (200) to select (440) an algorithm (A1) from
a plurality of algorithms (A1 to AN) for performing the task in
accordance with the result of comparison.

18. The computer system (990) of claim 17, further comprising:

a threshold calculator (230) for recalculating (470) the at least one
threshold value in case the actual performance of the selected
algorithm (A1) is non-compliant with the at least one threshold value.

19. The computer system (990) of claim 17 or 18, where 
the at least one threshold value separates the value 
range of the parameter (P1) into at least two inter- 
vals of a first dimension. 

20. The computer system (990) of claim 19, wherein the selected
algorithm (A1) is assigned to the interval that includes the
corresponding current value of the first dimension.

21. The computer system (990) of claim 19, where at least one further
threshold value separates the value range of a further parameter into
at least two intervals of a second dimension.

22. The computer system (990) of claim 21, wherein the selected
algorithm (A1) is assigned to the intersection of the interval of the
first dimension that includes the corresponding current parameter
value of the first dimension and the interval of the second dimension
that includes the corresponding current parameter value of the second
dimension.

23. The computer system (990) of any one of the claims 19 to 22,
wherein each threshold value corresponds to a break-even point where
two neighbouring algorithms have the same performance with respect to
the corresponding dimension.

Amended claims in accordance with Rule 86(2) EPC.

1. A computer implemented method for automatic software tuning
comprising the steps of:

calculating (410) initially a first threshold value for a first
parameter (P1) and at least a second threshold value for at least a
second parameter (P2), the parameters (P1, P2) influencing the
performance of a software application (200) with regards to a specific
task, wherein the first threshold value separates the value range of
the first parameter (P1) into two intervals of a first dimension and
the at least second threshold value separates the value range of the
at least second parameter into at least two intervals of a second
dimension;
comparing (430) the first threshold value to a corresponding current
value of the first parameter and the at least second threshold value
to a corresponding current value of the at least second parameter; and
selecting (440) an algorithm (A1) from a plurality of algorithms (A1
to AN) for performing the task in accordance with the result of the
comparing step (430), wherein the selected algorithm (A1) is assigned
to the intersection of the interval of the first dimension that
includes the corresponding current parameter value of the first
dimension and the interval of the at least second dimension that
includes the corresponding current parameter value of the second
dimension.

2. Method of claim 1 comprising the further steps of:

measuring (450) the performance of the selected algorithm (A1);
checking (460) whether the selected algorithm (A1) delivers the best
performance within the plurality of algorithms (A1 to AN); and
recalculating (470) at least one of the threshold values if a further
algorithm of the plurality of algorithms performs better in the
intersection including the current parameter values of the first and
at least second dimensions, wherein the recalculation is performed so
that the further algorithm gets automatically selected in the
intersection defined by the at least one recalculated threshold value.

3. Method of any one of the claims 1 to 2, wherein each threshold
value corresponds to a break-even point where two neighbouring
algorithms have the same performance with respect to the corresponding
dimension.

4. A computer program product for automatic software tuning comprising
a plurality of instructions that when loaded into a memory of a
computer system (990) cause at least one processor of the computer
system (990) to execute the steps of any one of the claims 1 to 3.

5. A computer system (990) for running a software application (200)
comprising:

variables (210) for storing a first threshold value for a first
parameter (P1) and at least a second threshold value for at least a
second parameter (P2), the parameters (P1, P2) influencing the
performance of the software application (200) with regards to a
specific task, wherein the first threshold value separates the value
range of the first parameter (P1) into two intervals of a first
dimension and the at least second threshold value separates the value
range of the at least second parameter into at least two intervals of
a second dimension, the first and at least second parameter values
having initial values being set by running test cases for a plurality
of algorithms (A1 to AN) for performing the specific task; and
a threshold evaluator (220) configured to compare (430) the first
threshold value to a corresponding current value of the first
parameter and the at least second threshold value to a corresponding
current value of the at least second parameter, wherein the interval
of the first dimension that includes the corresponding current
parameter value of the first dimension and the interval of the at
least second dimension that includes the corresponding current
parameter value of the second dimension define an intersection, the
intersection being used by the software application (200) to select
(440) an algorithm (A1) assigned to the intersection from the
plurality of algorithms (A1 to AN) for performing the task in
accordance with the result of comparison.

6. The computer system (990) of claim 5, further comprising:

a threshold calculator (230) configured to recalculate (470) at least
one of the threshold values if a further algorithm of the plurality of
algorithms performs better in the intersection including the current
parameter values of the first and at least second dimensions, wherein
the recalculation is performed so that the further algorithm gets
automatically selected in the intersection defined by the at least one
recalculated threshold value.

7. The computer system (990) of any one of the claims 5 to 6, wherein
each threshold value corresponds to a break-even point where two
neighbouring algorithms have the same performance with respect to the
corresponding dimension.